    Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

    This paper presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are crucial in optimization, uncertainty quantification, and machine learning. The computational cost of computing gradients is a common bottleneck in practice. For applications that are parallelized for multicore CPUs or GPUs using OpenMP, one also wishes to compute the gradients in parallel. We propose a framework to reason about the correctness of the generated derivative code, from which we justify our OpenMP extension to the differentiation model. We implement this model in the automatic differentiation tool Tapenade and present test cases that are differentiated following our extended differentiation procedure. The generated derivative programs outperform sequential execution in both forward and reverse mode, although our reverse mode often scales worse than the input programs.
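
    A minimal C/OpenMP sketch of the forward-mode part of this idea (illustrative only, not Tapenade's generated code; the function names and the seeding are assumptions): in tangent mode every assignment is paired with an assignment of its derivative, and because the augmented iterations remain independent, the worksharing pragma of the original loop carries over to the tangent loop unchanged.

        /* Hedged sketch: forward-mode (tangent) AD of an OpenMP worksharing loop.
         * f_d is what a source-transformation tool could emit for f; the names
         * and the seeding below are illustrative assumptions. */
        #include <stdio.h>

        /* Original (primal) loop: y[i] = x[i]^2, parallel over i. */
        void f(const double *x, double *y, int n) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i)
            y[i] = x[i] * x[i];
        }

        /* Tangent loop: xd holds the input direction, yd the directional derivative.
         * Each iteration still touches only index i, so the same pragma is safe. */
        void f_d(const double *x, const double *xd, double *y, double *yd, int n) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i) {
            yd[i] = 2.0 * x[i] * xd[i];  /* derivative statement */
            y[i]  = x[i] * x[i];         /* original statement   */
          }
        }

        int main(void) {
          enum { N = 4 };
          double x[N] = {1, 2, 3, 4}, xd[N] = {1, 1, 1, 1}, y[N], yd[N];
          f_d(x, xd, y, yd, N);
          printf("y[2]=%g  dy[2]/dx[2]=%g\n", y[2], yd[2]);  /* prints 9 and 6 */
          return 0;
        }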

    Discrete adjoints on many cores: Algorithmic differentiation of accelerated fluid simulations

    Simulations are used in science and industry to predict the performance of technical systems. Adjoint derivatives of these simulations can reveal the sensitivity of the system performance to changes in design or operating conditions, and are increasingly used in shape optimisation and uncertainty quantification. Algorithmic differentiation (AD) by source transformation is an efficient method to compute such derivatives. AD requires an analysis of the computation and its data flow to produce efficient adjoint code. One important step is the activity analysis that detects operations that need to be differentiated. This thesis investigates an improved activity analysis that simplifies build procedures for certain adjoint programs and is demonstrated to improve the speed of an adjoint fluid dynamics solver. The method works by allowing a context-dependent analysis of routines. The ongoing trend towards multi- and many-core architectures such as the Intel Xeon Phi is creating challenges for AD. Two novel approaches are presented that replicate the parallelisation of a program in its corresponding adjoint program. The first approach detects loops that naturally result in a parallelisable adjoint loop, while the second approach uses loop transformation and the aforementioned context-dependent analysis to enforce parallelisable data access in the adjoint loop. A case study shows that both approaches yield adjoints that are as scalable as their underlying primal programs. Adjoint computations are limited by their memory footprint, particularly in unsteady simulations, for which this work presents incomplete checkpointing as a method to reduce memory usage at the cost of a slight reduction in accuracy. Finally, the convergence of iterative linear solvers is discussed, which is especially relevant on accelerator cards, where single-precision floating-point numbers are frequently used and the choice of solvers is limited by the small memory size. Some problems that are particular to adjoint computations are discussed.
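
    A hedged C/OpenMP sketch of the first of the two approaches mentioned above (illustrative code, not taken from the thesis; the names and the example function are assumptions): when every iteration of the primal loop reads and writes only its own array elements, each adjoint iteration likewise writes only its own element of the derivative array, so the adjoint loop can reuse the same worksharing pragma without atomic updates or private copies.

        /* Hedged sketch: a primal loop whose adjoint is naturally parallelisable.
         * Function names and the objective are illustrative assumptions. */
        #include <math.h>
        #include <stdio.h>

        /* Primal: y[i] = x[i] * exp(x[i]); iterations are independent. */
        void primal(const double *x, double *y, int n) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i)
            y[i] = x[i] * exp(x[i]);
        }

        /* Adjoint: given yb = dJ/dy, accumulate xb = dJ/dx.  Iteration i writes
         * only xb[i], so the adjoint parallelises exactly like the primal. */
        void primal_b(const double *x, double *xb, const double *yb, int n) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i)
            xb[i] += (exp(x[i]) + x[i] * exp(x[i])) * yb[i];
        }

        int main(void) {
          enum { N = 8 };
          double x[N], y[N], xb[N] = {0}, yb[N];
          for (int i = 0; i < N; ++i) { x[i] = 0.1 * i; yb[i] = 1.0; }
          primal(x, y, N);
          primal_b(x, xb, yb, N);
          printf("dJ/dx[3] = %g\n", xb[3]);
          return 0;
        }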

    Automatic Differentiation of Parallel Loops with Formal Methods

    This paper presents a novel combination of reverse-mode automatic differentiation and formal methods to enable efficient differentiation of (or backpropagation through) shared-memory parallel loops. Compared to the state of the art, our approach can reduce the need for atomic updates or private data copies during the parallel derivative computation, even in the presence of unstructured or data-dependent access patterns. This is achieved by gathering information about the memory access patterns from the input program, which is assumed to be correctly parallelized. This information is then used to build a model with assertions in a theorem prover, which is used to check the safety of shared-memory accesses in the parallel derivative loops. We demonstrate this approach on scientific computing benchmarks, including a lattice Boltzmann method (LBM) solver from the Parboil benchmark suite and a Green's function Monte Carlo (GFMC) kernel from the CORAL benchmark suite.
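
    The following hedged C/OpenMP sketch illustrates the problem the paper targets (it is not the paper's generated code; the kernel and names are assumptions): reverse-mode differentiation turns each read of x[idx[i]] in the primal loop into an increment of xb[idx[i]] in the adjoint loop, which in general requires an atomic update; if the access pattern can be proved exclusive, for example by establishing in a theorem prover that idx is injective over the iteration space, the atomic can be dropped and the adjoint keeps the primal's parallel structure.

        #include <stdio.h>

        /* Primal gather: y[i] = 3 * x[idx[i]], parallel over i. */
        void gather(const double *x, const int *idx, double *y, int n) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i)
            y[i] = 3.0 * x[idx[i]];
        }

        /* Conservative adjoint: correct for any idx, at the cost of atomics. */
        void gather_b_atomic(double *xb, const int *idx, const double *yb, int n) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i) {
            #pragma omp atomic
            xb[idx[i]] += 3.0 * yb[i];
          }
        }

        /* Adjoint that is only safe once idx is known to be injective on [0, n):
         * no two iterations then write the same xb entry, so no atomics needed. */
        void gather_b_exclusive(double *xb, const int *idx, const double *yb, int n) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i)
            xb[idx[i]] += 3.0 * yb[i];
        }

        int main(void) {
          enum { N = 4 };
          double x[N] = {1, 2, 3, 4}, y[N], yb[N] = {1, 1, 1, 1}, xb[N] = {0};
          int idx[N] = {3, 2, 1, 0};                /* injective permutation */
          gather(x, idx, y, N);
          gather_b_exclusive(xb, idx, yb, N);
          printf("xb[0]=%g (expected 3)\n", xb[0]);
          return 0;
        }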

    Model Checking Race-freedom When "Sequential Consistency for Data-race-free Programs" is Guaranteed

    Many parallel programming models guarantee that if all sequentially consistent (SC) executions of a program are free of data races, then all executions of the program will appear to be sequentially consistent. This greatly simplifies reasoning about the program, but leaves open the question of how to verify that all SC executions are race-free. In this paper, we show that with a few simple modifications, model checking can be an effective tool for verifying race-freedom. We explore this technique on a suite of C programs parallelized with OpenMP.
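
    A hedged C/OpenMP illustration of the property in question (an assumed example, not one of the paper's benchmarks): the first loop below races on sum in some sequentially consistent execution, so the guarantee gives the program no meaning, while the second loop expresses the accumulation as an OpenMP reduction and is race-free in every SC execution; distinguishing the two automatically is the race-freedom check that model checking over SC executions can provide.

        #include <stdio.h>

        int main(void) {
          enum { N = 1000 };
          double a[N], sum = 0.0, sum2 = 0.0;
          for (int i = 0; i < N; ++i) a[i] = 1.0;

          /* Racy: concurrent unsynchronised read-modify-write of the shared
           * variable `sum`; some SC execution contains a data race. */
          #pragma omp parallel for
          for (int i = 0; i < N; ++i)
            sum += a[i];

          /* Race-free: each thread accumulates a private copy and OpenMP
           * combines the partial sums, so all SC executions are race-free. */
          #pragma omp parallel for reduction(+ : sum2)
          for (int i = 0; i < N; ++i)
            sum2 += a[i];

          printf("racy sum = %g, reduction sum = %g (expected %d)\n", sum, sum2, N);
          return 0;
        }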

    Automatic Differentiation for Adjoint Stencil Loops

    Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable. In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications.
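
    A hedged C/OpenMP sketch of the kind of loop transformation involved (illustrative code under assumed names, not PerforAD output): for a 1-D three-point stencil, the conventional adjoint scatters increments into xb[i-1], xb[i], and xb[i+1], which is no longer stencil-like and races if parallelised naively, whereas re-gathering the same contributions lets each adjoint iteration read a three-point stencil of yb and write only xb[i], recovering the structure and parallelism of the primal loop.

        #include <stdio.h>

        /* Primal 1-D stencil: y[i] = a*x[i-1] + b*x[i] + c*x[i+1] on the interior. */
        void stencil(const double *x, double *y, int n, double a, double b, double c) {
          #pragma omp parallel for
          for (int i = 1; i < n - 1; ++i)
            y[i] = a * x[i - 1] + b * x[i] + c * x[i + 1];
        }

        /* Conventional adjoint: scatters into xb, so iterations collide and the
         * loop needs atomics (or serialisation) to be parallelised safely. */
        void stencil_b_scatter(double *xb, const double *yb, int n, double a, double b, double c) {
          for (int i = 1; i < n - 1; ++i) {
            xb[i - 1] += a * yb[i];
            xb[i]     += b * yb[i];
            xb[i + 1] += c * yb[i];
          }
        }

        /* Gathered adjoint: the same contributions re-indexed so that iteration i
         * reads a stencil of yb and writes only xb[i]; parallel like the primal. */
        void stencil_b_gather(double *xb, const double *yb, int n, double a, double b, double c) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i) {
            double acc = 0.0;
            if (i + 1 >= 1 && i + 1 <= n - 2) acc += a * yb[i + 1];  /* x[i] was x[j-1] in iteration j = i+1 */
            if (i     >= 1 && i     <= n - 2) acc += b * yb[i];      /* x[i] was x[j]   in iteration j = i   */
            if (i - 1 >= 1 && i - 1 <= n - 2) acc += c * yb[i - 1];  /* x[i] was x[j+1] in iteration j = i-1 */
            xb[i] += acc;
          }
        }

        int main(void) {
          enum { N = 6 };
          double yb[N] = {0, 1, 1, 1, 1, 0}, xb1[N] = {0}, xb2[N] = {0};
          stencil_b_scatter(xb1, yb, N, 0.25, 0.5, 0.25);
          stencil_b_gather(xb2, yb, N, 0.25, 0.5, 0.25);
          for (int i = 0; i < N; ++i)   /* the two adjoints agree element-wise */
            printf("xb_scatter[%d]=%g  xb_gather[%d]=%g\n", i, xb1[i], i, xb2[i]);
          return 0;
        }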

    Surrogate Neural Networks to Estimate Parametric Sensitivity of Ocean Models

    Modeling is crucial to understanding the effect of greenhouse gases, warming, and ice sheet melting on the ocean. At the same time, ocean processes affect phenomena such as hurricanes and droughts. Parameters in the models that cannot be physically measured have a significant effect on the model output. For an idealized ocean model, we generated perturbed parameter ensemble data and trained surrogate neural network models. The neural surrogates accurately predicted the one-step forward dynamics, from which we then computed the parametric sensitivity.

    TROPHY: Trust Region Optimization Using a Precision Hierarchy

    We present an algorithm to perform trust-region-based optimization for nonlinear unconstrained problems. The method selectively uses function and gradient evaluations at different floating-point precisions to reduce the overall energy consumption, storage, and communication costs; these capabilities are increasingly important in the era of exascale computing. In particular, we are motivated by a desire to improve computational efficiency for massive climate models. We employ our method on two examples: the CUTEst test set and a large-scale data assimilation problem to recover wind fields from radar returns. Although this paper is primarily a proof of concept, we show that if implemented on appropriate hardware, the use of mixed precision can significantly reduce the computational load compared with fixed-precision solvers.
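
    A hedged C sketch of the precision-hierarchy idea (a conceptual illustration, not the TROPHY algorithm or its switching criterion; the objective and the switching rule are assumptions): evaluations start in single precision, and once the predicted decrease of a trust-region-style step falls to the level of single-precision round-off, function and derivative evaluations are promoted to double precision, so cheap evaluations dominate the early iterations and accurate ones are used only near convergence.

        #include <float.h>
        #include <math.h>
        #include <stdio.h>

        /* Illustrative objective f(x) = (x - 1)^4 and its derivative, evaluated
         * at the requested precision (assumed example, not from the paper). */
        static double eval_f(double x, int use_double) {
          if (use_double) { double d = x - 1.0;  return d * d * d * d; }
          float xf = (float)x, d = xf - 1.0f;    return (double)(d * d * d * d);
        }
        static double eval_g(double x, int use_double) {
          if (use_double) { double d = x - 1.0;  return 4.0 * d * d * d; }
          float xf = (float)x, d = xf - 1.0f;    return (double)(4.0f * d * d * d);
        }

        int main(void) {
          double x = 3.0, radius = 1.0;
          int use_double = 0;                                  /* start cheap */
          for (int k = 0; k < 100; ++k) {
            double f = eval_f(x, use_double);
            double g = eval_g(x, use_double);
            double step = (g > 0.0 ? -1.0 : 1.0) * fmin(radius, fabs(g));
            double pred = -g * step;                           /* predicted decrease */
            /* Promote precision once the predicted decrease reaches the level of
             * single-precision round-off (an illustrative switching rule). */
            if (!use_double && pred <= 16.0 * FLT_EPSILON * fmax(fabs(f), 1.0)) {
              use_double = 1;
              continue;                                        /* redo step accurately */
            }
            double f_new = eval_f(x + step, use_double);
            double rho = (f - f_new) / (pred > 0.0 ? pred : DBL_MIN);
            if (rho > 0.1) { x += step; radius *= 2.0; }       /* accept, expand */
            else           { radius *= 0.5; }                  /* reject, shrink */
            if (use_double && radius < 1e-12) break;
          }
          printf("minimiser ~= %.12f (expected 1)\n", x);
          return 0;
        }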